library(tidyverse)  # provides read_delim(), cols(), %>%, filter(), mutate() and ggplot()
factbook<-read_delim("factbook.csv",delim=";",col_types = cols(
  Country=col_character(),
  `Area(sq km)`=col_double(),
  `Birth rate(births/1000 population)`=col_double(),
  `Current account balance`=col_double(),
  `Death rate(deaths/1000 population)`=col_double(),
  `Debt - external`=col_double(),
  `Electricity - consumption(kWh)`=col_double(),
  `Electricity - production(kWh)`=col_double(),
  `Exports`=col_double(),
  `GDP`=col_double(),
  `GDP - per capita`=col_double(),
  `GDP - real growth rate(%)`=col_double(),
  `HIV/AIDS - adult prevalence rate(%)`=col_double(),
  `HIV/AIDS - deaths`=col_double(),
  `HIV/AIDS - people living with HIV/AIDS`=col_double(),
  `Highways(km)`=col_double(),
  `Imports`=col_double(),
  `Industrial production growth rate(%)`=col_double(),
  `Infant mortality rate(deaths/1000 live births)`=col_double(),
  `Inflation rate (consumer prices)(%)`=col_double(),
  `Internet hosts`=col_double(),
  `Internet users`=col_double(),
  `Investment (gross fixed)(% of GDP)`=col_double(),
  `Labor force`=col_double(),
  `Life expectancy at birth(years)`=col_double(),
  `Military expenditures - dollar figure`=col_double(),
  `Military expenditures - percent of GDP(%)`=col_double(),
  `Natural gas - consumption(cu m)`=col_double(),
  `Natural gas - exports(cu m)`=col_double(),
  `Natural gas - imports(cu m)`=col_double(),
  `Natural gas - production(cu m)`=col_double(),
  `Natural gas - proved reserves(cu m)`=col_double(),
  `Oil - consumption(bbl/day)`=col_double(),
  `Oil - exports(bbl/day)`=col_double(),
  `Oil - imports(bbl/day)`=col_double(),
  `Oil - production(bbl/day)`=col_double(),
  `Oil - proved reserves(bbl)`=col_double(),
  `Population`=col_double(),
  `Railways(km)`=col_double(),
  `Public debt(% of GDP)`=col_double(),
  `Reserves of foreign exchange & gold`=col_double(),
  `Telephones - main lines in use`=col_double(),
  `Telephones - mobile cellular`=col_double(),
  `Total fertility rate(children born/woman)`=col_double(),
  `Unemployment rate(%)`=col_double()
))
## Warning: 44 parsing failures.
## row                                col expected actual           file
##   1 Area(sq km)                        a double double 'factbook.csv'
##   1 Birth rate(births/1000 population) a double double 'factbook.csv'
##   1 Current account balance            a double double 'factbook.csv'
##   1 Death rate(deaths/1000 population) a double double 'factbook.csv'
##   1 Debt - external                    a double double 'factbook.csv'
## ... .................................. ........ ...... ..............
## See problems(...) for more details.
# Row 1 holds the type labels that triggered the parsing failures above, so drop it
factbook<-factbook[2:264,]
colnames(factbook)
##  [1] "Country"                                       
##  [2] "Area(sq km)"                                   
##  [3] "Birth rate(births/1000 population)"            
##  [4] "Current account balance"                       
##  [5] "Death rate(deaths/1000 population)"            
##  [6] "Debt - external"                               
##  [7] "Electricity - consumption(kWh)"                
##  [8] "Electricity - production(kWh)"                 
##  [9] "Exports"                                       
## [10] "GDP"                                           
## [11] "GDP - per capita"                              
## [12] "GDP - real growth rate(%)"                     
## [13] "HIV/AIDS - adult prevalence rate(%)"           
## [14] "HIV/AIDS - deaths"                             
## [15] "HIV/AIDS - people living with HIV/AIDS"        
## [16] "Highways(km)"                                  
## [17] "Imports"                                       
## [18] "Industrial production growth rate(%)"          
## [19] "Infant mortality rate(deaths/1000 live births)"
## [20] "Inflation rate (consumer prices)(%)"           
## [21] "Internet hosts"                                
## [22] "Internet users"                                
## [23] "Investment (gross fixed)(% of GDP)"            
## [24] "Labor force"                                   
## [25] "Life expectancy at birth(years)"               
## [26] "Military expenditures - dollar figure"         
## [27] "Military expenditures - percent of GDP(%)"     
## [28] "Natural gas - consumption(cu m)"               
## [29] "Natural gas - exports(cu m)"                   
## [30] "Natural gas - imports(cu m)"                   
## [31] "Natural gas - production(cu m)"                
## [32] "Natural gas - proved reserves(cu m)"           
## [33] "Oil - consumption(bbl/day)"                    
## [34] "Oil - exports(bbl/day)"                        
## [35] "Oil - imports(bbl/day)"                        
## [36] "Oil - production(bbl/day)"                     
## [37] "Oil - proved reserves(bbl)"                    
## [38] "Population"                                    
## [39] "Public debt(% of GDP)"                         
## [40] "Railways(km)"                                  
## [41] "Reserves of foreign exchange & gold"           
## [42] "Telephones - main lines in use"                
## [43] "Telephones - mobile cellular"                  
## [44] "Total fertility rate(children born/woman)"     
## [45] "Unemployment rate(%)"
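
All 44 parsing failures above come from row 1, where each numeric column holds the literal text double instead of a number, which suggests the file carries a row of type labels directly under the header. The base-R sketch below shows one way to skip such a row while reading, rather than dropping it afterwards. The miniature file, its three columns, and the String label for the Country column are assumptions made for illustration; they are not taken from factbook.csv.

```r
# Hypothetical miniature of the file layout: header line, type-label line, data lines.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country;Area(sq km);Population",
  "String;double;double",
  "Albania;28748;3563112",
  "Algeria;2381740;32531853"
), path)

# Read the header line on its own, then skip both the header and the type row.
header <- strsplit(readLines(path, n = 1), ";", fixed = TRUE)[[1]]
mini <- read.table(path, sep = ";", skip = 2, header = FALSE,
                   col.names = header, check.names = FALSE,
                   stringsAsFactors = FALSE, quote = "", comment.char = "")

str(mini)  # numeric columns parse cleanly because the type row was never read
```

With this approach no row has to be removed by position after the import, so the result does not depend on knowing the row count in advance.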

Team Question

We were hired by a public health specialist to determine the relationship between the level of industrialization (measured by electricity, natural gas, and oil production and consumption, as well as industrial production growth rate) and several measures of population health: death rate, infant mortality, and life expectancy. Understanding this relationship can guide policy on where industrialization should be encouraged or discouraged in order to improve the health of the populace. The question matters to a public health specialist because the world is becoming increasingly industrialized and reliant on energy production, so any positive or negative trend between industrialization and population health could influence which kinds of industrial production and consumption are supported in developing and developed countries. Conclusive findings would also matter for public policy in developed countries: government officials would need to be aware of them so that policies can serve both the planet’s health and each country’s population health. Without a public health specialist, who can work with lobbyists to bring these issues to elected officials, the data may fall on deaf ears and have little or no impact on governmental policymaking.

Electricity Production

factbook1_1 <- factbook %>% filter(!is.na(`Death rate(deaths/1000 population)`) & !is.na(`Electricity - production(kWh)`))
ggplot(data = factbook1_1) + geom_point(aes(x= `Electricity - production(kWh)`, y = `Death rate(deaths/1000 population)`), color = "blue") + labs(title = "Death Rate vs. Electricity Production", x = "Electricity Production (kWh)", y = "Death Rate (deaths/1000 population)")

factbook1_2 <- factbook %>% filter(!is.na(`Infant mortality rate(deaths/1000 live births)`) & !is.na(`Electricity - production(kWh)`))
ggplot(data = factbook1_2)+ geom_point(aes(x= `Electricity - production(kWh)`, y  = `Infant mortality rate(deaths/1000 live births)`), color = "red")+ labs(title = "Infant Mortality Rate vs. Electricity Production", x = "Electricity Production (kWh)", y = "Infant Mortality Rate (deaths/1000 live births)")

factbook1_3 <- factbook %>% filter(!is.na(`Life expectancy at birth(years)`) & !is.na(`Electricity - production(kWh)`))
ggplot(data = factbook1_3)+ geom_point(aes(x= `Electricity - production(kWh)`, y  = `Life expectancy at birth(years)`), color = "green") + labs(title = "Life Expectancy vs. Electricity Production", x = "Electricity Production (kWh)", y = "Life Expectancy at Birth (years)")

Electricity Consumption

factbook2_1 <- factbook %>% filter(!is.na(`Death rate(deaths/1000 population)`) & !is.na(`Electricity - consumption(kWh)`))
ggplot(data = factbook2_1) + geom_point(aes(x= `Electricity - consumption(kWh)`, y = `Death rate(deaths/1000 population)`), color = "blue") + labs(title = "Death Rate vs. Electricity Consumption", x = "Electricity Consumption (kWh)", y = "Death Rate (deaths/1000 population)")

factbook2_2 <- factbook %>% filter( !is.na(`Infant mortality rate(deaths/1000 live births)`) & !is.na(`Electricity - consumption(kWh)`))
ggplot(data = factbook2_2)+ geom_point(aes(x= `Electricity - consumption(kWh)`, y  = `Infant mortality rate(deaths/1000 live births)`), color = "red") + labs(title = "Infant Mortality Rate vs. Electricity Consumption", x = "Electricity Consumption (kWh)", y = "Infant Mortality Rate (deaths/1000 live births)")

factbook2_3 <- factbook %>% filter(!is.na(`Life expectancy at birth(years)`) & !is.na(`Electricity - consumption(kWh)`))
ggplot(data = factbook2_3)+ geom_point(aes(x= `Electricity - consumption(kWh)`, y  = `Life expectancy at birth(years)`), color = "green")+ labs(title = "Life Expectancy vs. Electricity Consumption", x = "Electricity Consumption (kWh)", y = "Life Expectancy at Birth (years)")

Findings: Electricity Production & Consumption

As the graphs above show, there is no strong relationship between electricity production or consumption and the three measures of population health. Death rate, for example, varies widely among the countries clustered near zero on the x-axis: both the minimum and the maximum death rates in the data set occur at essentially zero electricity production and consumption. The data do seem to show life expectancy rising and mortality rates falling as electricity production and consumption increase, but because the many countries near zero show no pattern at all, we cannot say whether this trend is meaningful or is driven by factors not accounted for in these graphs.
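
Because national electricity totals span several orders of magnitude, most points pile up near zero on a linear axis. One hedged way to check whether that pile-up hides a trend is to compare the correlation on the raw scale, on a log scale, and by rank. The sketch below uses made-up numbers, not values from the factbook, and the column names are hypothetical.

```r
# Made-up values spanning several orders of magnitude, like national kWh totals.
toy <- data.frame(
  production_kwh  = c(2e7, 5e8, 3e9, 4e10, 6e11, 4e12),
  life_expectancy = c(52, 58, 66, 71, 77, 78)
)

# Pearson correlation on the raw scale is dominated by the largest producers.
r_raw <- cor(toy$production_kwh, toy$life_expectancy)

# A log10 transform spreads out the countries that sit near zero.
r_log <- cor(log10(toy$production_kwh), toy$life_expectancy)

# Spearman correlation depends only on ranks, so it ignores the axis scale.
rho <- cor(toy$production_kwh, toy$life_expectancy, method = "spearman")

round(c(raw = r_raw, log10 = r_log, spearman = rho), 2)
```

In ggplot2 the same transformation can be applied to a scatterplot by adding scale_x_log10() to the plot.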

Natural Gas Production

factbook3_1 <- factbook %>% filter(!is.na(`Death rate(deaths/1000 population)`) & !is.na(`Natural gas - production(cu m)`))
ggplot(data = factbook3_1) + geom_point(aes(x= `Natural gas - production(cu m)`, y = `Death rate(deaths/1000 population)`), color = "blue") + labs(title = "Death Rate vs. Natural Gas Production", x = "Natural Gas Production (cu m)", y = "Death Rate (deaths/1000 population)")

factbook3_2 <- factbook %>% filter( !is.na(`Infant mortality rate(deaths/1000 live births)`) & !is.na(`Natural gas - production(cu m)`))
ggplot(data = factbook3_2) + geom_point(aes(x= `Natural gas - production(cu m)`, y  = `Infant mortality rate(deaths/1000 live births)`), color = "red") + labs(title = "Infant Mortality Rate vs. Natural Gas Production", x = "Natural Gas Production (cu m)", y = "Infant Mortality Rate (deaths/1000 live births)")

factbook3_3 <- factbook %>% filter(!is.na(`Life expectancy at birth(years)`) & !is.na(`Natural gas - production(cu m)`))
ggplot(data = factbook3_3)+ geom_point(aes(x= `Natural gas - production(cu m)`, y  = `Life expectancy at birth(years)`), color = "green")+ labs(title = "Life Expectancy vs. Natural Gas Production", x = "Natural Gas Production (cu m)", y = "Life Expectancy at Birth (years)")

Natural Gas Consumption

factbook4_1 <- factbook %>% filter(!is.na(`Death rate(deaths/1000 population)`) & !is.na(`Natural gas - consumption(cu m)`))
ggplot(data = factbook4_1) + geom_point(aes(x= `Natural gas - consumption(cu m)`, y = `Death rate(deaths/1000 population)`), color = "blue") + labs(title = "Death Rate vs. Natural Gas Consumption", x = "Natural Gas Consumption (cu m)", y = "Death Rate (deaths/1000 population)")

factbook4_2 <- factbook %>% filter( !is.na(`Infant mortality rate(deaths/1000 live births)`) & !is.na(`Natural gas - consumption(cu m)`))
ggplot(data = factbook4_2) + geom_point(aes(x= `Natural gas - consumption(cu m)`, y  = `Infant mortality rate(deaths/1000 live births)`), color = "red") + labs(title = "Infant Mortality Rate vs. Natural Gas Consumption", x = "Natural Gas Consumption (cu m)", y = "Infant Mortality Rate (deaths/1000 live births)")

factbook4_3 <- factbook %>% filter(!is.na(`Life expectancy at birth(years)`) & !is.na(`Natural gas - consumption(cu m)`))
ggplot(data = factbook4_3)+ geom_point(aes(x= `Natural gas - consumption(cu m)`, y  = `Life expectancy at birth(years)`), color = "green")+ labs(title = "Life Expectancy vs. Natural Gas Consumption", x = "Natural Gas Consumption (cu m)", y = "Life Expectancy at Birth (years)")

Findings: Natural Gas Production and Consumption

Again, the graphs for natural gas production and consumption show at most a slight trend toward better population health with increased production or consumption. The life expectancy graphs do suggest that life expectancy is higher on average at higher levels of production and consumption, but most of the countries with high life expectancy sit near zero on the x-axis, which suggests that factors other than the size of the natural gas industry account for their high life expectancy. The death rate graphs reinforce this point: they seem to indicate that death rate rises as a country’s natural gas industry expands, which appears to contradict the trend suggested by life expectancy at birth.

Oil Production

factbook5_1 <- factbook %>% filter(!is.na(`Death rate(deaths/1000 population)`) & !is.na(`Oil - production(bbl/day)`))
ggplot(data = factbook5_1) + geom_point(aes(x= `Oil - production(bbl/day)`, y = `Death rate(deaths/1000 population)`), color = "blue") + labs(title = "Death Rate vs. Oil Production", x = "Oil Production (bbl/day)", y = "Death Rate (deaths/1000 population)")

factbook5_2 <- factbook %>% filter(!is.na(`Infant mortality rate(deaths/1000 live births)`) & !is.na(`Oil - production(bbl/day)`))
ggplot(data = factbook5_2)+ geom_point(aes(x= `Oil - production(bbl/day)`, y  = `Infant mortality rate(deaths/1000 live births)`), color = "red") + labs(title = "Infant Mortality Rate vs. Oil Production", x = "Oil Production (bbl/day)", y = "Infant Mortality Rate (deaths/1000 live births)")

factbook5_3 <- factbook %>% filter(!is.na(`Life expectancy at birth(years)`) & !is.na(`Oil - production(bbl/day)`))
ggplot(data = factbook5_3)+ geom_point(aes(x= `Oil - production(bbl/day)`, y  = `Life expectancy at birth(years)`), color = "green")+ labs(title = "Life Expectancy vs. Oil Production", x = "Oil Production (bbl/day)", y = "Life Expectancy at Birth (years)")

Oil Consumption

factbook6_1 <- factbook %>% filter(!is.na(`Death rate(deaths/1000 population)`) & !is.na(`Oil - consumption(bbl/day)`))
ggplot(data = factbook6_1) + geom_point(aes(x= `Oil - consumption(bbl/day)`, y = `Death rate(deaths/1000 population)`), color = "blue") + labs(title = "Death Rate vs. Oil Consumption", x = "Oil Consumption (bbl/day)", y = "Death Rate (deaths/1000 population)")

factbook6_2 <- factbook %>% filter(!is.na(`Infant mortality rate(deaths/1000 live births)`) & !is.na(`Oil - consumption(bbl/day)`))
ggplot(data = factbook6_2) + geom_point(aes(x= `Oil - consumption(bbl/day)`, y  = `Infant mortality rate(deaths/1000 live births)`), color = "red") + labs(title = "Infant Mortality Rate vs. Oil Consumption", x = "Oil Consumption (bbl/day)", y = "Infant Mortality Rate (deaths/1000 live births)")

factbook6_3 <- factbook %>% filter(!is.na(`Life expectancy at birth(years)`) & !is.na(`Oil - consumption(bbl/day)`))
ggplot(data = factbook6_3)+ geom_point(aes(x= `Oil - consumption(bbl/day)`, y  = `Life expectancy at birth(years)`), color = "green")+ labs(title = "Life Expectancy vs. Oil Consumption", x = "Oil Consumption (bbl/day)", y = "Life Expectancy at Birth (years)")

Findings: Oil Production and Consumption

As with the two previous measures of industrialization, the graphs show no clear trend toward better population health as a country’s oil industry expands. The oil production and consumption data suggest that other factors are much more influential for population health than the level of oil production and consumption.

Industrial Growth Rate

factbook7_1 <- factbook %>% filter(!is.na(`Death rate(deaths/1000 population)`) & !is.na(`Industrial production growth rate(%)`))
ggplot(data = factbook7_1) + geom_point(mapping = aes(x =  `Industrial production growth rate(%)`, y = `Death rate(deaths/1000 population)`), color = "blue") +geom_smooth(mapping = aes(x =  `Industrial production growth rate(%)`, y = `Death rate(deaths/1000 population)`))+ labs(title = "Death Rate vs. Industrial Production Growth Rate", x = "Industrial Production Growth Rate (%)", y = "Death Rate (deaths/1000 population)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

factbook7_2 <- factbook %>% filter(!is.na(`Infant mortality rate(deaths/1000 live births)`) & !is.na(`Industrial production growth rate(%)`))
ggplot(data = factbook7_2) + geom_point(mapping = aes(x =  `Industrial production growth rate(%)`, y = `Infant mortality rate(deaths/1000 live births)`), color = "red")+ geom_smooth(mapping = aes(x =  `Industrial production growth rate(%)`, y = `Infant mortality rate(deaths/1000 live births)`), color = "red")+ labs(title = "Infant Mortality Rate vs. Industrial Production Growth Rate", x = "Industrial Production Growth Rate (%)", y = "Infant Mortality Rate (deaths/1000 live births)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

factbook7_3 <- factbook %>% filter(!is.na(`Life expectancy at birth(years)`) & !is.na(`Industrial production growth rate(%)`))
ggplot(data = factbook7_3)+ geom_point(aes(x= `Industrial production growth rate(%)`, y  = `Life expectancy at birth(years)`), color = "green")+ geom_smooth(aes(x= `Industrial production growth rate(%)`, y  = `Life expectancy at birth(years)`), color = "green")+ labs(title = "Life Expectancy vs. Industrial Production Growth Rate", x = "Industrial Production Growth Rate (%)", y = "Life Expectancy at Birth (years)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Findings: Industrial Production Growth Rate

Industrial production growth rate shows no apparent correlation with any of the three measures of population health. The points are scattered seemingly at random in all three graphs, and none of the trendlines indicates a clear relationship.

Overall Conclusion and Recommendation:

The graphs above suggest little to no trend between levels of industrialization and overall population health. In some cases there may be a slight positive trend between one measure of population health and industrial production in a specific area, but such trends are often contradicted by negative trends in other measures of population health, which suggests they are more likely artifacts than real effects. Any improvement in population health that accompanies industrialization is therefore likely due to factors that the public health specialist did not ask us to study. We recommend that no action be taken on the basis of these data: the graphs do not establish that any relationship exists between industrialization and population health, and it would be foolhardy to recommend action without one. We do recommend that the public health specialist explore other factors that may have a greater impact on population health, such as GDP per capita and other measures of economic prosperity. Economic prosperity is more likely to indicate a country’s success in development and is therefore more likely to correlate strongly with population health. Industrialization varies too widely across both developing and developed countries to be a reliable measure of a country’s prosperity or overall population health.
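
The conclusion above rests on reading scatterplots by eye. A compact numerical companion is a table of correlations between each industrialization measure and each health measure. The sketch below is a minimal illustration on made-up numbers with hypothetical short column names; it uses pairwise deletion so that an NA in one column does not discard a country from every comparison.

```r
# Made-up values with a few NAs, standing in for factbook columns.
toy <- data.frame(
  oil_consumption = c(1e4, 5e4, NA, 2e6, 8e5, 3e5),
  growth_rate     = c(3.1, 6.0, 1.0, NA, 12.0, 1.9),
  death_rate      = c(5.1, 4.6, 25.9, 8.3, 7.6, 7.4),
  life_expectancy = c(77.2, 73.0, 36.6, 77.7, 75.9, 80.4)
)

predictors <- c("oil_consumption", "growth_rate")
outcomes   <- c("death_rate", "life_expectancy")

# Rank (Spearman) correlations; each cell uses the rows complete for that pair.
cors <- cor(toy[predictors], toy[outcomes],
            use = "pairwise.complete.obs", method = "spearman")

round(cors, 2)
```

Rows are the industrialization measures and columns are the health measures, so a row of values near zero would back up a "no trend" reading for that measure.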

Brief Description of the dataset

The dataset being used is a factbook with data on 160 countries and ~100 non-country political entities. For each it has 44 measurements, including debt, resource consumption/production, internet users, life expectancy, fertility rates, and many more.

These data come from the CIA. However, the CIA doesn’t provide a CSV file download, so the actual CSV file was obtained from https://perso.telecom-paristech.fr/eagan/class/igr204/datasets

Codebook:

Braden Griebel Individual Section:

Question: What is the relationship between the lengths of highways and railroads when controlling for land area and wealth (measured by GDP per capita), and what effect does the ratio of railways to highways have on the country’s oil consumption?

Note that the countries were divided into three size categories at the 1/3 and 2/3 quantiles (roughly the 33rd and 67th percentiles) of country area. Small countries are those with an area of at most 3962.8 sq km, medium countries are between 3962.8 and 184848.8 sq km, and large countries are anything larger. The small category consists mostly of ancient cities and other outliers, with large numbers of NAs across all the variables. The data for small countries are therefore not very useful and are mostly excluded from the rest of the analysis.
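
The cut points quoted above can be derived from the data instead of being typed in by hand. The sketch below shows one way to do that with quantile() and cut() on a made-up area vector (the numbers are illustrative, not the factbook values). The intervals are closed on the right, which matches a "less than or equal to" rule for the upper edge of each group.

```r
# Made-up areas in sq km, including one missing value.
area <- c(2, 160, 5128, 28748, 83870, 301230, 2381740, 7686850, 17075200, NA)

# Cut points at the 1/3 and 2/3 quantiles, ignoring missing areas.
cuts <- quantile(area, probs = c(1/3, 2/3), na.rm = TRUE)

# Assign each area to a tercile; a missing area stays NA.
size <- cut(area, breaks = c(-Inf, cuts, Inf),
            labels = c("small", "medium", "large"))

table(size, useNA = "ifany")
```

Deriving the breaks this way keeps the categories in step with the data if rows are added or removed.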

bfactbook <- factbook %>%
  mutate(
    # Kilometers of railway per kilometer of highway
    ratio = `Railways(km)` / `Highways(km)`,
    # GDP per capita binned in steps of 10,000; the label is the lower bound in thousands
    GDP_fac = case_when(
      `GDP - per capita` <= 10000 ~ 0,
      `GDP - per capita` <= 20000 ~ 10,
      `GDP - per capita` <= 30000 ~ 20,
      `GDP - per capita` <= 40000 ~ 30,
      `GDP - per capita` <= 50000 ~ 40,
      `GDP - per capita` > 50000 ~ 50
    ),
    # Size category from the 1/3 and 2/3 quantiles of area; NA areas stay NA
    area_fac = case_when(
      `Area(sq km)` <= 3962.8 ~ "small",
      `Area(sq km)` <= 184848.8 ~ "medium",
      `Area(sq km)` > 184848.8 ~ "large"
    )
  )

large<-bfactbook%>%filter(area_fac=="large")
medium<-bfactbook%>%filter(area_fac=="medium")
small<-bfactbook%>%filter(area_fac=="small")
ggplot(large)+geom_jitter(aes(`Highways(km)`,`Railways(km)`,color=as.factor(GDP_fac)))+
  facet_wrap(~area_fac)+
  geom_smooth(method=lm,aes(`Highways(km)`,`Railways(km)`,color=as.factor(GDP_fac)),se=F)+coord_cartesian(xlim=c(184848.8,2e+06))+
  labs(title='Highways and Railroad in Large countries',color="Categorical GDP per capita in Thousands",
       caption="GDP per capita between 0 and 10000 represented by 0, between 10000 and 20000 by a 10 etc")
## Warning: Removed 15 rows containing non-finite values (stat_smooth).
## Warning: Removed 15 rows containing missing values (geom_point).

ggplot(medium)+geom_jitter(aes(`Highways(km)`,`Railways(km)`,color=as.factor(GDP_fac)))+
  facet_wrap(~area_fac)+
  geom_smooth(method=lm,aes(`Highways(km)`,`Railways(km)`,color=as.factor(GDP_fac)),se=F)+coord_cartesian(xlim=c(3962.8,184848.8))+
  labs(title='Highways and Railroad in Medium countries',color="Categorical GDP per capita in Thousands",
       caption="GDP per capita between 0 and 10000 represented by 0, between 10000 and 20000 by a 10 etc")
## Warning: Removed 30 rows containing non-finite values (stat_smooth).
## Warning: Removed 30 rows containing missing values (geom_point).

ggplot(small)+geom_jitter(aes(`Highways(km)`,`Railways(km)`,color=as.factor(GDP_fac)))+
  facet_wrap(~area_fac)+
  geom_smooth(method=lm,aes(`Highways(km)`,`Railways(km)`),se=F)+#coord_cartesian(xlim=c(0,3962.8))+
  labs(title='Highways and Railroad in Small countries',color="Categorical GDP per capita in Thousands",
       caption="GDP per capita between 0 and 10000 represented by 0, between 10000 and 20000 by a 10 etc")
## Warning: Removed 84 rows containing non-finite values (stat_smooth).
## Warning: Removed 84 rows containing missing values (geom_point).

bfactbook<-bfactbook%>%mutate("oil_percapita"=`Oil - consumption(bbl/day)`/Population)
ggplot(bfactbook,aes(ratio,oil_percapita))+geom_point()+geom_smooth(se=F,method=lm)+coord_cartesian(xlim=c(0,.2))+
labs(title="Oil Consumption vs. Ratio of Railways to Highways",x="Ratio of Railroad to Highways",y="Per Capita Oil Consumed (bbl/day)")  
## Warning: Removed 130 rows containing non-finite values (stat_smooth).
## Warning: Removed 130 rows containing missing values (geom_point).

bfb<-bfactbook%>%filter(!is.na(ratio)&!is.na(`oil_percapita`))

Correlation Between Ratio of Railways to Highways and Per Capita Oil Consumption (bbl/day)

cor(bfb$ratio,bfb$oil_percapita)
## [1] -0.1165541

Findings:

From the graph directly above and the correlation coefficient (about -0.12), there is only a weak negative correlation between the ratio of railways to highways and per capita oil consumption. This could mean a variety of things, such as the efficiency of trains and road vehicles not being very different, or variation in usage rates not being reflected by the size of the rail network. Either way, these data give little support for building more railways per km of highway as a way to reduce oil use. From the first two graphs (the third is for small countries and other territories, most of which have NAs across most of their variables), it can be seen that across GDP levels and country sizes there is a positive correlation between kilometers of highways and kilometers of railways in a country. This makes sense, as both are correlated with country size. The slope of the lines varies between GDP levels, but there don’t seem to be any consistent trends in which levels of wealth lead to higher or lower slopes of the linear model. I would recommend an analysis that takes into account actual usage statistics for the railways, since that would allow a better understanding of whether trains are a good investment when seeking to reduce the use of fossil fuels. Additionally, data on the efficiency of trains versus trucks and cars in conveying people and goods should be gathered in order to fully understand the relationship between the proportion of train usage and oil consumption.
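
A correlation coefficient on its own does not say whether it is distinguishable from zero. Base R's cor.test() reports a p-value and a confidence interval alongside the estimate. The sketch below runs it on made-up numbers (the data frame and its values are hypothetical, not the bfb data), so the output says nothing about the actual factbook result.

```r
# Made-up railway-to-highway ratios and per capita oil consumption values.
toy <- data.frame(
  ratio         = c(0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.15),
  oil_percapita = c(0.020, 0.004, 0.031, 0.010, 0.025, 0.003, 0.018, 0.009)
)

# Pearson correlation test: estimate, p-value and 95% confidence interval.
test <- cor.test(toy$ratio, toy$oil_percapita)

r  <- unname(test$estimate)  # same value that cor() would return
p  <- test$p.value           # probability of an |r| this large if the true r were 0
ci <- test$conf.int          # interval that contains 0 when the evidence is weak

r; p; ci
```

Applied to the real data, the equivalent call would be cor.test(bfb$ratio, bfb$oil_percapita).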

Ethics:

This analysis could hurt train companies that advertise their services based on the environmental impact of driving, since there seems to be little difference in oil consumption between countries with a high proportion of railways to highways and those with a low proportion. In the same way, it could help car manufacturers by suggesting that more trains don’t necessarily reduce a country’s oil consumption. Additionally, since the dataset does not include actual train usage, I had to use a proxy (more kilometers of rail lines means more train usage) that isn’t necessarily well correlated with actual rail usage, so this analysis of train use and oil consumption could be inaccurate. If it is, people who take trains out of concern for the environment might stop, which could increase oil consumption and thus the CO2 released into the atmosphere, or governments seeking to reduce carbon emissions might not invest in trains. Ultimately, this analysis could cause the entire world to burn.

Luke Fanning Individual Section

Question: How does the rate of military expenditures, in terms of percent of GDP, affect various measures of economic success, such as GDP growth rate, unemployment rate, public debt, as well as various import and export variables?

l_1factbook <- factbook %>% filter(!is.na(`Military expenditures - percent of GDP(%)`) & !is.na(`GDP - real growth rate(%)`))

ggplot(data = l_1factbook, mapping = aes(x = `Military expenditures - percent of GDP(%)`, y = `GDP - real growth rate(%)`)) + geom_point() + geom_smooth() + labs(title = "Military Expenditures vs. GDP Growth Rate", x = "Military Expenditures as a percent of GDP", y = "GDP - Real Growth Rate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

l_2factbook <- factbook %>% filter(!is.na(`Military expenditures - percent of GDP(%)`) & !is.na(`Unemployment rate(%)`))

ggplot(data = l_2factbook, mapping = aes(x = `Military expenditures - percent of GDP(%)`, y = `Unemployment rate(%)`)) + geom_point() + geom_smooth() + labs(title = "Military Expenditures vs. Unemployment Rate", x = "Military Expenditures as a percent of GDP", y = "Unemployment Rate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

l_3factbook <- factbook %>% filter(!is.na(`Military expenditures - percent of GDP(%)`) & !is.na(`Public debt(% of GDP)`))

ggplot(data = l_3factbook, mapping = aes(x = `Military expenditures - percent of GDP(%)`, y = `Public debt(% of GDP)`)) + geom_point() + geom_smooth() + labs(title = "Military Expenditures vs. Public Debt", x = "Military Expenditures as a percent of GDP", y = "Public Debt as a percent of GDP")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

l_4factbook <- factbook %>% filter(!is.na(`Military expenditures - percent of GDP(%)`) & !is.na(`Imports`))

ggplot(data = l_4factbook, mapping = aes(x = `Military expenditures - percent of GDP(%)`, y = `Imports`)) + geom_point() + geom_smooth() + labs(title = "Military Expenditures vs. Imports", x = "Military Expenditures as a percent of GDP", y = "Number of Imports")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

l_5factbook <- factbook %>% filter(!is.na(`Military expenditures - percent of GDP(%)`) & !is.na(`Exports`))

ggplot(data = l_5factbook, mapping = aes(x = `Military expenditures - percent of GDP(%)`, y = `Exports`)) + geom_point() + geom_smooth() + labs(title = "Military Expenditures vs. Exports", x = "Military Expenditures as a percent of GDP", y = "Number of Exports")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Findings:

The graphs above show no clear correlation between military spending as a percent of GDP and any of the measures of economic success. This suggests that countries looking to boost their economies should focus on spending in areas other than the military. This is especially relevant in the US, where military spending has never been higher, yet we teeter closer and closer to a recession. One important thing to keep in mind, however, is that while there is no clear correlation between military expenditures and overall economic success, higher military expenditures do appear to go along with decreased economic variability. This means that while you cannot necessarily boost a country’s economy by spending money on the military, military spending might act as a stabilizing factor that helps reduce the negative externalities of spending in other areas of the economy.
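
The variability observation can be checked more directly than by eye: group countries into bands of military spending and compare the spread of an outcome within each band. The sketch below does this with made-up numbers. The band edges (2% and 4% of GDP), the column names, and the values are all assumptions chosen for illustration.

```r
# Made-up military spending (% of GDP) and real GDP growth rates (%).
toy <- data.frame(
  mil_pct = c(0.5, 0.9, 1.2, 1.8, 2.5, 3.1, 4.0, 5.5, 7.2),
  growth  = c(-2.0, 9.0, 1.5, 6.0, 3.0, 4.5, 3.2, 4.1, 3.6)
)

# Hypothetical spending bands; intervals are closed on the right.
toy$band <- cut(toy$mil_pct, breaks = c(0, 2, 4, Inf),
                labels = c("0-2", "2-4", "4+"))

# Standard deviation of growth within each band: smaller means less variable.
spread <- sapply(split(toy$growth, toy$band), sd)

round(spread, 2)
```

With few countries in the highest band, a small spread there may simply reflect the small sample, so band sizes are worth reporting next to the standard deviations.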

Ethical Implications:

Ethically, this can benefit governments that heed the advice of keeping a balanced spending plan in terms of expenditures as a percent of GDP. Many countries (such as the US) might be better off spending less on their military, as the graphs above (especially military expenditures vs. public debt) suggest that beyond a certain point, increased military expenditures are associated with negative economic factors rather than an economic upturn. However, this finding could also be extremely harmful to the national security of many smaller countries: if all countries were to increase military expenditures to the point of economic stabilization, the world would be a much more dangerous place to live in, given the volatility of various political figures.

Ahyo Falick Individual Section:

Question 1 How does the percent of people in the labor force compared with the total population affect the industrial production growth rate? This is an interesting question because it might show correlations between a larger labor force and a high industrial production growth rate. The answer to this question could point to an ideal percent of workers in a country to optimize industrial production. From the data, it can be analyzed what percent of the population works for more industrialized countries and what percent works for less industrialized countries.

labor <- factbook %>% select(Country, `Labor force`, Population, `Industrial production growth rate(%)`, `GDP Growth Rate (%)` = `GDP - real growth rate(%)`) %>% mutate(`Labor force (%)` = `Labor force` / Population * 100)
labor <- labor %>% filter(!is.na(`Industrial production growth rate(%)`))
labor
## # A tibble: 164 x 6
##    Country `Labor force` Population `Industrial pro… `GDP Growth Rat…
##    <chr>           <dbl>      <dbl>            <dbl>            <dbl>
##  1 Albania       1090000    3563112              3.1              5.6
##  2 Algeria       9910000   32531853              6                6.1
##  3 Angola        5410000   11190786              1               11.7
##  4 Anguil…          6049      13254              3.1              2.8
##  5 Antigu…         30000      68722              6                3  
##  6 Argent…      15040000   39537943             12                8.3
##  7 Armenia       1400000    2982904             15                9  
##  8 Austra…      10350000   20090437              1.9              3.5
##  9 Austria       3450000    8184691              3.3              1.9
## 10 Azerba…       5090000    7911974              4                9.8
## # … with 154 more rows, and 1 more variable: `Labor force (%)` <dbl>
ggplot(labor, aes(x = `Labor force (%)`, y = `Industrial production growth rate(%)`)) + geom_point(color = "blue") + geom_smooth(color = "red") + labs(title = "Labor Force vs. Industrial Production Growth Rate", x = "Labor Force (%)", y = "Industrial Production Growth Rate (%)")

ggplot(labor, aes(x = `Labor force (%)`, y = `GDP Growth Rate (%)`)) + geom_point(color = "orange", size = 1) + geom_smooth(color = "magenta") + labs(title = "Labor Force vs. GDP Growth Rate (%)", x = "Labor Force (%)", y = "GDP Growth Rate (%)") + ylim(-5, 20)

Findings:

The first graph compares the labor force percentage of a country with its industrial production growth rate. The individual data points are depicted in blue and a smoothed trend line is in red. From this data, it appears that, up to a point, countries with a larger labor force percentage also have a higher industrial production growth rate. The trend line rises steadily until the labor force reaches nearly 50% of the population, where the line begins to decrease. This suggests that there may be an ideal percentage of workers for a country to achieve the most industrial growth.
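The turning point of a smoothed trend line can be located numerically rather than read off the plot. The sketch below fits a `loess()` curve (the same smoother `geom_smooth()` reports using for this size of data) to synthetic data whose true peak is placed at 50% by construction; the `toy` data frame and its columns `labor_pct` and `ind_growth` are hypothetical and do not come from the factbook.

```r
# Synthetic data with a known peak at 50% (illustration only, NOT factbook values)
set.seed(1)
toy <- data.frame(labor_pct = seq(20, 80, by = 1))
toy$ind_growth <- 10 - (toy$labor_pct - 50)^2 / 100 + rnorm(nrow(toy), sd = 0.2)

# Fit a loess smoother with default settings
fit <- loess(ind_growth ~ labor_pct, data = toy)

# Evaluate the fitted curve on a fine grid and find where it is highest
grid <- data.frame(labor_pct = seq(20, 80, by = 0.5))
grid$fitted <- predict(fit, newdata = grid)
peak_pct <- grid$labor_pct[which.max(grid$fitted)]

print(peak_pct)
```

Applying the same steps to the `labor` data frame would give a concrete estimate for the labor force percentage at which the red trend line peaks, though such an estimate depends on the smoothing span and on how sparse the data are near the peak.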

The second plot compares the labor force percentage with a country’s gross domestic product (GDP) growth rate. As with the industrial production growth rate, the GDP growth rate increases steadily as the labor force percentage increases. The trend line rises across almost the whole graph, with a slight dip around 50%. This suggests that the GDP growth rate is positively correlated with the labor force percentage, and that countries with a larger share of their population in the labor force tend to have a higher GDP growth rate.

Ethics Reflection:

This data can benefit countries with lower growth rates of industrial production and GDP by demonstrating that these rates can increase by establishing a larger labor force.

Andrew Duffy Individual Section

Question How does a country’s life expectancy at birth vary based on its GDP per capita? I find this interesting because there are many variables you can compare to GDP per capita to see how money affects people’s lives in different ways. In particular, how do the lives of people in developing countries differ from those of people in developed countries? The most valuable thing for us humans is life, so I wonder how one’s wealth and income affect their life expectancy. Do people in developing countries live shorter or longer lives?

Pre-Analysis First I want to look at a few other variables before plotting life expectancy vs. GDP per capita, as I would like to find similar variables that also show a correlation to strengthen my findings. Since I am concerned with how one’s income affects their length of life, I should also look at death rate and GDP growth by percentage. My reasoning for using these 2 variables is that death rate is a direct look at the rate at which people die in a certain country, whereas life expectancy is measured at birth and there are a lot of things that can happen between birth and death to affect the length of a person’s life. I also want to look at GDP growth because I would like to see whether countries that are developing at a fast pace show a correlation with life expectancy.

df <- factbook %>% filter(!is.na(`GDP - per capita`) & !is.na(`Death rate(deaths/1000 population)`))
df <- df %>% arrange(`GDP - per capita`)

ggplot(data = df) + geom_smooth(aes(x = `Death rate(deaths/1000 population)`, y = `GDP - per capita`), se = FALSE) + ggtitle("GDP per capita (US$) vs. Death Rate (deaths/1000 population)")

df <- factbook %>% filter(!is.na(`GDP - real growth rate(%)`) & !is.na(`Life expectancy at birth(years)`))
df <- df %>% arrange(`GDP - real growth rate(%)`)

ggplot(data = df) + geom_smooth(aes(x = `Life expectancy at birth(years)`, y = `GDP - real growth rate(%)`), se = FALSE) + ggtitle("GDP growth rate (%) vs. Life Expectancy (at birth)")

df <- factbook %>% filter(!is.na(`GDP - per capita`) & !is.na(`Life expectancy at birth(years)`))
df <- df %>% arrange(`GDP - per capita`)

ggplot(data = df) + geom_smooth(aes(x = `Life expectancy at birth(years)`, y = `GDP - per capita`), color = "red", se = FALSE) + ggtitle("GDP per capita (US$) vs. Life Expectancy (at birth)")

Findings

There is not much of a correlation between life expectancy and GDP growth, or between GDP per capita and death rate. I wasn’t very surprised by this, because a lot of things can affect death rate and life expectancy on a small scale and create outliers. In fact, the death rate vs. GDP per capita plot shows an uptick at the end of the graph, where countries with a relatively high GDP per capita experience the highest death rates, and the GDP growth vs. life expectancy plot suggests that countries with lower GDP growth rates have a high life expectancy. However, there is a very strong correlation between GDP per capita and life expectancy: the higher a country’s income per resident, the higher its life expectancy.
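Whether a relationship like this is closer to exponential than linear can be checked by comparing a straight-line fit with a fit on the log of GDP per capita. The sketch below is a minimal illustration on synthetic data generated to be exponential; the `toy` data frame, its columns `life_exp` and `gdp_pc`, and the coefficients used to generate it are all assumptions, not factbook values.

```r
# Synthetic data where GDP per capita grows exponentially with life expectancy
# (illustration only, NOT factbook values)
set.seed(42)
toy <- data.frame(life_exp = seq(45, 80, length.out = 60))
toy$gdp_pc <- exp(2.5 + 0.1 * toy$life_exp + rnorm(60, sd = 0.2))

# Straight-line fit on the raw scale vs. straight-line fit on the log scale
fit_lin <- lm(gdp_pc ~ life_exp, data = toy)
fit_log <- lm(log(gdp_pc) ~ life_exp, data = toy)

r2_lin <- summary(fit_lin)$r.squared
r2_log <- summary(fit_log)$r.squared

print(c(linear = r2_lin, log_linear = r2_log))
```

The two R-squared values are computed on different response scales, so the comparison is only a rough guide. A clearly better fit on the log scale is consistent with an exponential-looking curve, but it does not establish that wealth causes longer life.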

Ethical Implications

The plot of GDP per capita vs. life expectancy shows a clear correlation between the 2. This implies that the richer you are, the longer you will live. Not only that, but the trend line appears to follow exponential growth, meaning there is a large difference in life expectancy between someone who makes $10,000 per year and someone who makes $80,000 per year. This is extremely valuable information, because a large corporation with insight into this correlation could tailor its services knowing that people in certain countries will live (on average) shorter or longer lives than people in other countries. This statistic could create a lot of room for profit if acted upon, which raises moral concerns of its own. It is quite sad to know that the average person in an undeveloped country will live roughly half as long as the average person in the richest countries, simply because of differences in medical access and the ability to afford proper health care and medications. In conclusion, as a species we should work to reduce economic inequality, or make the poorest countries in the world wealthier so that their populations can live longer.

Individual Contributions

Luke Fanning: I did my individual section and conclusion, answering my question through the use of geom_point and geom_smooth. I also used the dplyr function filter to remove NA values from the dataset in order to create a more accurate depiction of the data. I also wrote out the team question, adding the necessary parts about the implications of this data for the domain expert and describing why this dataset is important in this day and age. I then completed all of the graphs for the team section, and wrote the findings for each section as well as the overall conclusion and recommendation.

Braden Griebel: For the team section I imported the data set and used the col_double function to parse all the columns as numbers, and then removed the NA row at the top, which represented metadata that col_double parsed as NA (since the entries were character strings). I also wrote out the codebook and dataset description. For the individual section I added three new variables using the mutate function. The first was the ratio of kilometers of railways to kilometers of highways, to analyze how this was correlated with oil consumption. The second and third new variables coded for the size of a country’s area or GDP. The cutoffs for these were found using the quantile function to split the data roughly equally between the categories. I then used ggplot to plot the data, using color to split the data along the GDP categories. I initially used the facet_wrap function to split the data along the area categories, but this didn’t allow enough control over the scale of the axes, so instead I filtered the data using the filter function and created separate graphs. Finally, I used ggplot again to graph oil consumption vs. the ratio I had created earlier, using geom_point as well as geom_smooth with a linear model to show the trend. I added labels to make everything easier to read.
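The quantile-based categories described above can be sketched as follows. This is only an illustration of the general technique: the `gdp` values, the choice of three groups, and the labels "small", "medium", and "large" are assumptions, since the actual cutoffs and category names are not shown in this section.

```r
# Hypothetical GDP values (NOT from the factbook)
gdp <- c(5e8, 2e9, 7e9, 1.5e10, 4e10, 9e10, 3e11, 8e11, 2e12)

# Cutoffs at the 0th, 33rd, 67th and 100th percentiles give three roughly equal groups
cutoffs <- quantile(gdp, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)

# cut() assigns each value to a category; include.lowest = TRUE keeps the minimum value
gdp_size <- cut(gdp, breaks = cutoffs,
                labels = c("small", "medium", "large"),
                include.lowest = TRUE)

print(table(gdp_size))
```

Inside a dplyr pipeline, the same `cut()` call could be placed in `mutate()` to add the category as a new column, which can then be mapped to `color` in `aes()` or used in `filter()`.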

Ahyo Falick: In order to complete my individual section, I first selected only the variables needed to compare labor force to growth rates. These variables were the country, the labor force, the population, the industrial production growth rate, and the GDP growth rate. In order to find the labor force as a percentage, I then divided the labor force of a country by the population of that country. Finally, I created two separate plots using geom_point and geom_smooth to display the data cleanly and show its trends.
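As a small worked example of the percentage calculation, the sketch below recomputes the labor force percentage for the first three countries in the table printed earlier (Albania, Algeria, Angola), using the labor force and population figures shown there. The name `labor_toy` is just an illustrative placeholder.

```r
library(dplyr)

# First three rows of the printed `labor` table
labor_toy <- tibble(
  Country       = c("Albania", "Algeria", "Angola"),
  `Labor force` = c(1090000, 9910000, 5410000),
  Population    = c(3563112, 32531853, 11190786)
) %>%
  # Labor force as a percentage of the total population
  mutate(`Labor force (%)` = `Labor force` / Population * 100)

print(labor_toy)
```

For Albania this works out to roughly 30.6%, which is the kind of value plotted on the x-axis of both labor force graphs.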

Andrew Duffy: To complete my portion of this lab, I had to manipulate the factbook dataset to find a correlation between wealth and length of life. I did this with a few data frame manipulation functions, and used geom_smooth to graph my findings. I felt geom_smooth was the best plot to use because I was curious to see whether any correlation followed a recognizable functional form. For example, my life expectancy vs. GDP per capita plot showed a trend that looked strongly exponential. I found geom_smooth useful for this because it makes the overall trend easy to see. This dataset has a lot of data, so I figured it was acceptable to effectively ignore outliers by using geom_smooth instead of geom_point or some other plot function.
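A smoothed line alone hides how tightly the points cluster around it, so one option is to draw the points and the smoother together. The sketch below shows that layering on synthetic data; the `toy` data frame and its columns `life_exp` and `gdp_pc` are hypothetical stand-ins, not factbook values.

```r
library(ggplot2)

# Synthetic stand-in data (NOT from the factbook)
set.seed(7)
toy <- data.frame(life_exp = runif(80, min = 45, max = 82))
toy$gdp_pc <- exp(2.5 + 0.1 * toy$life_exp + rnorm(80, sd = 0.3))

# Points show the spread and any outliers; the smoother shows the overall trend
p <- ggplot(toy, aes(x = life_exp, y = gdp_pc)) +
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE, color = "red") +
  labs(title = "GDP per capita vs. Life Expectancy (synthetic data)",
       x = "Life expectancy at birth (years)",
       y = "GDP per capita (US$)")

print(p)
```

With both layers visible, a reader can judge whether the trend line reflects most countries or is being pulled by a handful of extreme values.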